Reduced complexity dynamic programming based on policy iteration
Authors
Abstract
Similar resources
On the Complexity of Policy Iteration
Decision-making problems in uncertain or stochastic domains are often formulated as Markov decision processes (MDPs). Policy iteration (PI) is a popular algorithm for searching over policy-space, the size of which is exponential in the number of states. We are interested in bounds on the complexity of PI that do not depend on the value of the discount factor. In this paper we prove the first...
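For orientation, here is a minimal sketch of the textbook policy iteration loop on a tabular MDP; the array shapes and names are illustrative assumptions of this sketch, not taken from the paper:

```python
import numpy as np

# Minimal tabular policy iteration on a toy MDP (illustrative setup,
# not the paper's construction). P has shape (A, S, S): P[a, s, t] is
# the probability of moving from state s to state t under action a.
# R has shape (S, A): expected immediate reward. gamma is the discount.
def policy_iteration(P, R, gamma=0.9):
    n_states, n_actions = R.shape
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) V = R_pi exactly.
        P_pi = np.array([P[policy[s], s] for s in range(n_states)])
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - gamma * P_pi, R_pi)
        # Policy improvement: act greedily with respect to V.
        Q = R + gamma * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, V  # stable greedy policy => optimal
        policy = new_policy
```

Termination is guaranteed because each improvement step yields a strictly better policy until the greedy policy is stable, and a finite MDP has only finitely many deterministic policies; how many improvement steps can occur, with a bound that does not depend on gamma, is exactly the question the abstract poses.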
Unifying Value Iteration, Advantage Learning, and Dynamic Policy Programming
Approximate dynamic programming algorithms, such as approximate value iteration, have been successfully applied to many complex reinforcement learning tasks, and a better approximate dynamic programming algorithm is expected to further extend the applicability of reinforcement learning to various tasks. In this paper we propose a new, robust dynamic programming algorithm that unifies value iter...
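As one plausible illustration of the family of operators being unified, the following tabular sketch implements a gap-increasing Bellman backup; the parameterization by alpha is an assumption for illustration, not necessarily the paper's unified operator:

```python
import numpy as np

# A gap-increasing Bellman backup in tabular form (a guess at the common
# structure, not necessarily the paper's operator). With alpha = 0 this
# is plain synchronous value iteration on Q; alpha > 0 adds an
# advantage-learning-style correction that widens action gaps.
def gap_increasing_backup(Q, P, R, gamma=0.9, alpha=0.5):
    V = Q.max(axis=1)                               # greedy state values
    TQ = R + gamma * np.einsum('ast,t->sa', P, V)   # one-step Bellman backup
    return TQ - alpha * (V[:, None] - Q)            # subtract alpha times the gap V(s) - Q(s, a)
```

Iterating Q = gap_increasing_backup(Q, P, R) with alpha = 0 is exactly value iteration; a positive alpha leaves the greedy action's value unchanged while pushing the others down, which is one way the value-iteration and advantage-learning updates can sit inside a single parameterized operator.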
An Efficient Policy Iteration Algorithm for Dynamic Programming Equations
We present an accelerated algorithm for the solution of static Hamilton-Jacobi-Bellman equations related to optimal control problems. Our scheme is based on a classic policy iteration procedure, which is known to have superlinear convergence in many relevant cases provided the initial guess is sufficiently close to the solution. This limitation often degenerates into a behavior similar to a valu...
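To make the scheme's structure concrete, here is a hedged sketch of Howard's algorithm (classic policy iteration) on a semi-Lagrangian discretization of a hypothetical 1-D problem; the dynamics x' = a with a in {-1, +1}, the grid, the discount rate lam, and the boundary treatment are all illustrative assumptions rather than the paper's setting:

```python
import numpy as np

# Howard's algorithm for a 1-D semi-Lagrangian HJB discretization:
# discounted running cost 1, target state at x = 0 where V = 0.
def solve_hjb_1d(n=101, dt=0.02, lam=1.0, xmax=1.0):
    x = np.linspace(-xmax, xmax, n)
    actions = np.array([-1.0, 1.0])
    beta = np.exp(-lam * dt)                # discount over one time step

    def interp_matrix(a):
        # Linear interpolation of V at the foot point x + a*dt, written
        # as a matrix M so that (M @ V)[i] approximates V(x[i] + a*dt).
        M = np.zeros((n, n))
        xf = np.clip(x + a * dt, -xmax, xmax)
        idx = np.clip(((xf + xmax) / (2 * xmax) * (n - 1)).astype(int), 0, n - 2)
        w = (xf - x[idx]) / (x[idx + 1] - x[idx])
        M[np.arange(n), idx] = 1 - w
        M[np.arange(n), idx + 1] = w
        return M

    Ms = [interp_matrix(a) for a in actions]
    policy = np.zeros(n, dtype=int)
    target = np.argmin(np.abs(x))           # enforce V = 0 at x = 0
    for _ in range(100):
        # Evaluation: solve (I - beta * M_pi) V = dt with V(target) = 0.
        M_pi = np.array([Ms[policy[i]][i] for i in range(n)])
        A = np.eye(n) - beta * M_pi
        b = np.full(n, dt)
        A[target] = 0.0; A[target, target] = 1.0; b[target] = 0.0
        V = np.linalg.solve(A, b)
        # Improvement: pick the control with the smaller backed-up value.
        vals = np.stack([dt + beta * (M @ V) for M in Ms], axis=1)
        new_policy = vals.argmin(axis=1)
        if np.array_equal(new_policy, policy):
            break
        policy = new_policy
    return x, V
```

The superlinear behavior the abstract mentions comes from the evaluation step solving the frozen-control linear system exactly, so each improvement is a Newton-like step on the discretized equation.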
Iteration-Complexity for Cone Programming
In this paper we consider the general cone programming problem, and propose primal-dual convex (smooth and/or nonsmooth) minimization reformulations for it. We then discuss first-order methods suitable for solving these reformulations, namely, Nesterov’s optimal method (Nesterov in Doklady AN SSSR 269:543–547, 1983; Math Program 103:127–152, 2005), Nesterov’s smooth approximation scheme (Nester...
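For reference, a minimal sketch of the first of these, Nesterov's optimal gradient method, as it is usually stated for smooth convex minimization; the function names and the n_iters parameter are assumptions of this sketch:

```python
import numpy as np

# Nesterov's optimal (accelerated) gradient method for smooth convex
# minimization, assuming the caller supplies grad (the gradient of the
# objective) and L (a Lipschitz constant of that gradient). This scheme
# achieves the O(1/k^2) rate, which is optimal for this problem class.
def nesterov_agd(grad, x0, L, n_iters=500):
    x, y, t = x0.copy(), x0.copy(), 1.0
    for _ in range(n_iters):
        x_next = y - grad(y) / L                          # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)  # momentum extrapolation
        x, t = x_next, t_next
    return x
```

For a least-squares objective f(x) = 0.5 * ||A x - b||^2, for instance, one would pass grad = lambda x: A.T @ (A @ x - b) and take L as the largest eigenvalue of A.T @ A.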
Temporal Differences-Based Policy Iteration and Applications in Neuro-Dynamic Programming
We introduce a new policy iteration method for dynamic programming problems with discounted and undiscounted cost. The method is based on the notion of temporal differences, and is primarily geared to the case of large and complex problems where the use of approximations is essential. We develop the theory of the method without approximation, we describe how to embed it within a neuro-dynamic p...
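As a sketch of the evaluation half of such a method, here is tabular TD(lambda) with accumulating eligibility traces; the environment interface is hypothetical, and the tabular setting deliberately omits the approximation machinery the abstract emphasizes:

```python
import numpy as np

# Tabular TD(lambda) policy evaluation with accumulating eligibility
# traces: the evaluation step of a temporal-differences-based policy
# iteration. The env.reset()/env.step(action) -> (s_next, r, done)
# interface is a hypothetical gym-like environment, assumed for the sketch.
def td_lambda_evaluate(env, policy, n_states, gamma=0.99, lam=0.9,
                       alpha=0.1, n_episodes=200):
    V = np.zeros(n_states)
    for _ in range(n_episodes):
        s = env.reset()
        e = np.zeros(n_states)                # eligibility trace per state
        done = False
        while not done:
            s_next, r, done = env.step(policy[s])
            target = r if done else r + gamma * V[s_next]
            delta = target - V[s]             # temporal-difference error
            e *= gamma * lam                  # decay all traces
            e[s] += 1.0                       # bump trace of the current state
            V += alpha * delta * e            # credit recently visited states
            s = s_next
    return V
```

An outer loop would alternate this evaluation with a greedy (or approximately greedy) improvement step; the undiscounted and large-scale approximate cases require the additional care the abstract alludes to.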
Journal
Journal title: Journal of Mathematical Analysis and Applications
Year: 1992
ISSN: 0022-247X
DOI: 10.1016/0022-247x(92)90007-z